A Flexible Unsupervised PP-Attachment Method Using Semantic Information

Authors

  • Srinivas Medimi
  • Pushpak Bhattacharyya
Abstract

In this paper we revisit the classical NLP problem of prepositional phrase attachment (PP-attachment). Given the pattern V-NP1-P-NP2 in the text, where V is the verb, NP1 is a noun phrase, P is the preposition and NP2 is the other noun phrase, the question is where P-NP2 attaches: to V or to NP1? This question is typically answered using both word knowledge and world knowledge. Word Sense Disambiguation (WSD) and Data Sparsity Reduction (DSR) are the two requirements for PP-attachment resolution. Our approach uses training data extracted from raw text, which makes it unsupervised. The unambiguous V-P-N and N1-P-N2 tuples of the training corpus teach the system how to resolve the attachments in the ambiguous V-N1-P-N2 tuples of the test corpus. A graph-based approach to WSD is used to obtain accurate word knowledge. Further, the data sparsity problem is addressed by (i) detecting synonymy using WordNet and (ii) performing a form of inferencing based on matching the Vs and Ns in the unambiguous patterns V-P-NP and NP1-P-NP2. For experimentation, the Brown Corpus provides the training data and the Wall Street Journal Corpus the test data. The accuracy obtained for PP-attachment resolution is close to 85%. The novelty of the system lies in the flexible use of the WSD and DSR phases.
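
The core idea in the abstract, letting unambiguous tuples decide ambiguous ones, can be illustrated with a minimal count-based sketch. This is not the authors' system: the graph-based WSD step is omitted, the `SYNONYMS` dictionary is a hypothetical stand-in for WordNet synonymy lookup, the back-off from full triples to (head, preposition) pairs is only one simple way to reduce sparsity, and the default of noun attachment on ties is an assumption. All function names and the toy tuples are invented for illustration.

```python
from collections import Counter

# Hypothetical synonym map standing in for WordNet synonymy (assumption).
SYNONYMS = {"purchase": "buy", "acquire": "buy", "automobile": "car"}

def canon(word):
    """Map a word to a canonical synonym to reduce data sparsity."""
    return SYNONYMS.get(word, word)

def train(verb_tuples, noun_tuples):
    """Count unambiguous (V, P, N) and (N1, P, N2) tuples.

    Each tuple is counted both as a full triple and as a
    (head, preposition) pair so that attach() can back off.
    """
    v_counts, n_counts = Counter(), Counter()
    for counts, tuples in ((v_counts, verb_tuples), (n_counts, noun_tuples)):
        for head, prep, obj in tuples:
            counts[(canon(head), prep, canon(obj))] += 1
            counts[(canon(head), prep)] += 1
    return v_counts, n_counts

def attach(v, n1, p, n2, model, default="N"):
    """Return "V" or "N" for an ambiguous V-N1-P-N2 tuple."""
    v_counts, n_counts = model
    v, n1, n2 = canon(v), canon(n1), canon(n2)
    # Try the full triple first, then back off to (head, preposition).
    for v_key, n_key in (((v, p, n2), (n1, p, n2)), ((v, p), (n1, p))):
        v_score, n_score = v_counts[v_key], n_counts[n_key]
        if v_score != n_score:
            return "V" if v_score > n_score else "N"
    return default  # no evidence either way (assumed default)

# Toy training data (invented for illustration).
model = train(
    verb_tuples=[("eat", "with", "fork"), ("eat", "with", "spoon"),
                 ("buy", "for", "friend")],
    noun_tuples=[("pizza", "with", "cheese"), ("car", "for", "sale")],
)
print(attach("eat", "pizza", "with", "fork", model))
print(attach("eat", "pizza", "with", "cheese", model))
```

In this sketch, synonym canonicalisation lets a test tuple such as ("purchase", "automobile", "for", "friend") match training evidence recorded under "buy" and "car", which is the kind of sparsity reduction the abstract attributes to WordNet.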

Similar resources

PP Attachment Ambiguity Resolution with Corpus-Based Pattern Distributions and Lexical Signatures

We propose a method mixing unsupervised learning of lexical pattern frequencies with semantic information which aims at improving the resolution of PP attachment ambiguity. Using the output of a robust parser, i.e. the set of all possible attachments for a given sentence, we query the Web and obtain statistical information about the frequencies of the attachments distributions as well as lexica...

Improving Parsing and PP Attachment Performance with Sense Information

To date, parsers have made limited use of semantic information, but there is evidence to suggest that semantic features can enhance parse disambiguation. This paper shows that semantic classes help to obtain significant improvement in both parsing and PP attachment tasks. We devise a gold-standard sense- and parse tree-annotated dataset based on the intersection of the Penn Treebank and SemCor, a...

Acquiring Selectional Preferences from Untagged Text for Prepositional Phrase Attachment Disambiguation

Extracting information automatically from texts for database representation requires previously well-grouped phrases so that entities can be separated adequately. This problem is known as prepositional phrase (PP) attachment disambiguation. Current PP attachment disambiguation systems require an annotated treebank or they use an Internet connection to achieve a precision of more than 90%. Unfor...

Improving PP Attachment Disambiguation in a Rule-based Parser

This paper deals with how to enhance the performance of a rule-based parser using statistical information. PP (Prepositional Phrase) attachment ambiguity is one of the main ambiguities found in parsing. We therefore conducted some experiments on extracting statistical information for PP attachment from a corpus, and on applying such information to a rule-based parser. Two types of information a...

Disambiguation of English PP Attachment using Multilingual Aligned Data

Prepositional phrase attachment (PP attachment) is a major source of ambiguity in English. It poses a substantial challenge to Machine Translation (MT) between English and languages that are not characterized by PP attachment ambiguity. In this paper we present an unsupervised, bilingual, corpus-based approach to the resolution of English PP attachment ambiguity. As data we use aligned linguist...

Publication date: 2007